LIVELINET: A Multimodal Deep Recurrent Neural Network to Predict Liveliness in Educational Videos
نویسندگان
چکیده
Online educational videos have emerged as one of the most popular modes of learning in the recent years. Studies have shown that liveliness is highly correlated to engagement in educational videos. While previous work has focused on feature engineering to estimate liveliness and that too using only the acoustic information, in this paper we propose a technique called LIVELINET that combines audio and visual information to predict liveliness. First, a convolutional neural network is used to predict the visual setup, which in turn identifies the modalities (visual and/or audio) to be used for liveliness prediction. Second, we propose a novel method that uses multimodal deep recurrent neural networks to automatically estimate if an educational video is lively or not. On the StyleX dataset of 450 one-minute long educational video snippets, our approach shows an relative improvement of 7.6% and 1.9% compared to a multimodal baseline and a deep network baseline using only the audio information respectively.
منابع مشابه
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
متن کاملThe Optimization of Forecasting ATMs Cash Demand of Iran Banking Network Using LSTM Deep Recursive Neural Network
One of the problems of the banking system is cash demand forecasting for ATMs (Automated Teller Machine). The correct prediction can lead to the profitability of the banking system for the following reasons and it will satisfy the customers of this banking system. Accuracy in this prediction are the main goal of this research. If an ATM faces a shortage of cash, it will face the decline of bank...
متن کاملMEMN: Multimodal Emotional Memory Network for Emotion Recognition in Dyadic Conversational Videos
Multimodal emotion recognition is a developing field of research which aims at detecting emotions in videos. For conversational videos, current methods mostly ignore the role of inter-speaker dependency relations while classifying emotions. In this paper, we address recognizing utterance-level emotions in dyadic conversations. We propose a deep neural framework, termed Multimodal Emotional Memo...
متن کاملDetecting Interrogative Utterances with Recurrent Neural Networks
In this paper, we explore different neural network architectures that can predict if a speaker of a given utterance is asking a question or making a statement. We compare the outcomes of regularization methods that are popularly used to train deep neural networks and study how different context functions can affect the classification performance. We also compare the efficacy of gated activation...
متن کاملAction Recognition with Joint Attention on Multi-Level Deep Features
We propose a novel deep supervised neural network for the task of action recognition in videos, which implicitly takes advantage of visual tracking and shares the robustness of both deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). In our method, a multi-branch model is proposed to suppress noise from background jitters. Specifically, we firstly extract multi-level dee...
متن کامل